The impact of the suppression of highly connected protein interactions on the corona virus infection

Several highly effective Covid-19 vaccines are in emergency use, although more-infectious coronavirus strains could delay the end of the pandemic even further. Because of this, it is highly desirable to develop fast antiviral drug treatments to accelerate the lasting immunity against the virus. From a theoretical perspective, computational approaches are useful tools for antiviral drug development based on the data analysis of gene expression, chemical structure, molecular pathway, and protein interaction mapping. This work studies the structural stability of virus–host interactome networks based on the graphical representation of virus–host protein interactions as vertices or nodes connected by commonly shared proteins. These graphical network visualization methods are analogous to those used in the design of artificial neural networks in neuromorphic computing. In the standard protein-node-based network representation, the virus–host interaction merges with the virus–protein and host–protein networks, introducing redundant links associated with the internal virus and host networks. In contrast, our approach provides a direct geometrical representation of the viral infection structure and allows the effective and fast detection of the structural robustness of the virus–host network through protein removal. This method was validated by applying it to the H1N1 and HIV viruses, for which we were able to pinpoint the changes in the interactome network produced by known vaccines. The application of this method to the SARS-CoV-2 virus–host protein interactome implies that the nonstructural proteins nsp4, nsp12, and nsp16, the nuclear pore membrane glycoprotein NUP210, and the ubiquitin-specific peptidase USP54 play a crucial role in the viral infection, and their removal may provide an efficient therapy. This method may be extended to any new mutations or other viruses for which the interactome network is experimentally determined.
Since time is of the essence, because of the impact of more-infectious strains on controlling the spread of the virus, this method may be a useful tool for novel antiviral therapies.

The current health emergency caused by the SARS-CoV-2 infection has prompted worldwide efforts to develop an antiviral treatment against Covid-19. The development of antiviral drugs requires an urgent, in-depth understanding of host-coronavirus protein-protein interactions. Many refinements of the SARS-CoV-2 interactome have been reported recently, but a treatment for the disease remains elusive. Moreover, although the genome sequence of SARS-CoV-2 is quite similar to those of SARS-CoV-1 and MERS-CoV, and several vaccines currently hold emergency use authorizations, there is no effective antiviral drug treatment yet. Current vaccines can be less effective against new variants of SARS-CoV-2 that could spread more quickly, cause more severe disease, or evade diagnostics. On the other hand, antiviral drugs can be easily administered and possibly transported without a cold chain and at low cost. In fact, the US government has invested $18.5 billion in vaccines but $8.2 billion in antiviral drug development, because it has not yet identified a highly effective drug to treat or prevent the Covid-19 infection (https://www.nytimes.com/2021/01/30/health/covid-drugs-antivirals.html).
Previous works focused mainly on viral protein properties within the host-virus protein-protein interaction (PPI) network by studying the PPI network's structural properties, local connectivity distribution, and cluster formation 43–46,48,49 . We propose an alternative approach based on the complexity and tolerance of the virus/protein-host/protein interaction network. This approach builds on the assumption that viral infection can be graphically represented by a network in which each node represents a virus-host protein interaction and the edges correspond to the proteins involved in each of those interactions. In virus-host systems, host-host protein interactions are commonly affected by viral proteins, since proteins involved in a host-host interaction frequently also take part in a virus-host interaction 50–52 . Thus, the protein-protein connection redundancy leads to a robust virus-host network, which is unaffected by variations in the host/protein network 53,54 .
On the other hand, during the viral infection, pathogen proteins mutate, changing the structural and topological properties of the network 50,55,56 . To reduce the statistical bias, we simulate the viral evolution starting from a fully connected virus-host network, where proteins interact with each other only if they are in physical contact. Therefore, the schematic representation of the virus-host interaction is not affected by the virus/protein and host/protein network structures. During the viral infection, pathogen and cellular proteins compete for binding partners, changing their protein-protein network structure 50,51 . For instance, mutations at the protein interfaces change the "protein electrostatics and structural properties" 50 . The evolution of a viral infection can be represented as a protein-protein interaction network in which patterns of interactions encode complex biological processes 56,57 , and statistical methods can be used for drug target identification 58 .
The efficiency of virus transmission, replication, and proliferation can be assessed from the effect that the removal of a virus-host protein has on the connectivity of this network. Identifying the main virus-host interactions is therefore key to the development of antiviral drugs.
In network science, many systems exhibit tolerance against errors. The ability to maintain interactions or communication, notwithstanding the structural changes caused by removing nodes, arises from redundant interconnections. Many networks in nature (or real networks) manifest high tolerance against local failures, but there are still target nodes whose removal causes a significant global impact on the network's structural properties.
The method. We built a virus/protein-host/protein interaction network based on a public PPI database as follows: each PPI consists of one viral protein and one host protein, and if two PPI share a common virus/protein or host/protein, then they are connected. All the information about the viral infection is encoded in the architecture of the virus/protein-host/protein interaction network. The removal of a specific protein produces a structural change in the network. We set the average connectivity (see Supplementary Information Section Fig. 9) as a proxy of the network tolerance against the deletion of proteins. We identify the target viral proteins as the proteins whose removal causes a significant variation of the network's average connectivity.
Graph theory. A virus/protein-host/protein interaction network is composed of a set of interactions $I_i, I_j, \ldots, I_k, I_l$, each of which indicates the interaction between a virus protein and a host protein, as shown in Fig. 1A,B. Two interactions are connected if they share a common virus protein or a host protein; if they share both proteins, then they are the same virus-host interaction. For instance, the interactions $I_i$ and $I_j$ share the E protein, so they are connected (Fig. 1C). We construct a graph network from the connected interactions and obtain a hierarchical cluster decomposition of the interactions, where each cluster is located in a specific cellular compartment, except for the largest SARS-CoV-2 cluster, which is formed by interactions with different cellular localizations (Fig. 1D). When a single virus protein or host protein is removed, all the links that contain it disappear, while if a single interaction is removed, all its links are dropped.
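The construction rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy protein labels (`H1`, `H2`, `H3`), and the use of connected components as a stand-in for the cluster decomposition are all assumptions made for the example.

```python
from itertools import combinations


def build_interaction_network(ppis):
    """Nodes are virus-host interactions; two nodes are linked when they
    share the virus protein or the host protein.

    `ppis` is a list of (virus_protein, host_protein) pairs (hypothetical
    input format). Identical pairs are the same interaction and are merged.
    """
    ppis = list(dict.fromkeys(ppis))  # drop duplicates, keep order
    adj = {i: set() for i in range(len(ppis))}
    for i, j in combinations(range(len(ppis)), 2):
        (v1, h1), (v2, h2) = ppis[i], ppis[j]
        if v1 == v2 or h1 == h2:
            adj[i].add(j)
            adj[j].add(i)
    return ppis, adj


def clusters(adj):
    """Connected components, used here as an assumed proxy for clusters."""
    seen, out = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        out.append(sorted(comp))
    return out


# Toy data with made-up host labels; the last pair duplicates the first.
TOY = [("E", "H1"), ("E", "H2"), ("M", "H2"), ("N", "H3"), ("E", "H1")]
TOY_PPIS, TOY_ADJ = build_interaction_network(TOY)
TOY_CLUSTERS = clusters(TOY_ADJ)
```

In this toy input, the first two interactions are linked through the shared E protein and the second and third through the shared host protein, so they fall into one cluster while the N interaction stays isolated.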

Validation.
A multiply connected random network is formed by clusters that are connected by multiple alternative connections. An important way to characterize such a network is the average connectivity of its clusters as a function of the number of nodes (or interactions in our case). For a randomly fully connected network this graph has a slope of 1, and any deviation from this reflects peculiarities and biases of the interactions and/or connectivity, which is an intrinsic feature of real networks 61,62 . Hepatitis and Ebola virus network architectures are composed of a few fully connected nodes, and thus our method is not applicable to them. Application of these concepts to the well-known H1N1 and HIV viruses (using the public PPI database http://viruses.string-db.org/ with a score > 0.7) reveals interesting systematics. The H1N1 (Fig. 2A) and HIV (Fig. 2B) virus/protein-host/protein interaction networks (see Supplementary Information Section Fig. 2) display a few small clusters (empty circles) and one large cluster (black dot). Figure 2 shows the average connectivity of each cluster as a function of the number of PPI. The dashed line has a slope of 1, as expected from a fully randomly interconnected network 61,62 . The small clusters lie on this line; however, the largest cluster (black dot, for each virus) is completely off the line. This indicates that the large cluster is weakly connected to the rest of the network and that elimination of a few connections may therefore produce a major disruption of network connectivity.
www.nature.com/scientificreports/
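The slope-1 reference line can be motivated with a short worked case. The following is a sketch under an idealizing assumption, namely that every interaction in a cluster of $n$ interactions is linked to every other one; the symbol $\langle k \rangle$ for the cluster average connectivity is introduced here for illustration.

```latex
% Idealized fully connected cluster of n interactions (assumption):
% every node is linked to the other n - 1 nodes.
\[
  k_i = n - 1 \quad \text{for all } i = 1, \ldots, n
  \qquad\Longrightarrow\qquad
  \langle k \rangle = \frac{1}{n} \sum_{i=1}^{n} k_i = n - 1 ,
\]
% so the average connectivity grows linearly with the number of
% interactions, with slope
\[
  \frac{d \langle k \rangle}{d n} = 1 .
\]
```

A cluster that contains many interactions but whose average connectivity falls well below $n - 1$ therefore sits below this line, which is the signature of sparse links holding its sub-clusters together.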
In Fig. 2 we show the effect of the removal of specific proteins on the position of the largest cluster on this graph. The variation of the average connectivity of the largest cluster is a measure of the tolerance of the network against PPI removal. Figure 2 shows that removal of some proteins from the network moves the position of the largest cluster closer to the expected random network curve. Interestingly, all the predicted target proteins (i.e., removed proteins producing a significant change in the largest cluster connectivity) for the H1N1 virus are already used in vaccines: HA (blue), attenuated vaccine strains; NA (cyan), conserved epitopes; M (red), ectodomain-based vaccines; PB2 (orange) and NS1 (magenta), live attenuated vaccine strains 1,63,64 . Figure 2B shows the same for the HIV virus, with the following target proteins: F (cyan); vif (blue); IL10 (magenta); CREB1 (orange); JUN (green); vpr (red). The inactivated-antigen approach to influenza vaccination promotes an immune response against the viral surface glycoproteins HA and neuraminidase NA 1,63,64 , while the live attenuated influenza vaccine targets M, PB2, and NS1 63,65-68 . Our method predicted those proteins as key targets whose removal disrupts the influenza network structure.
Our method identifies three key proteins that play a significant role in the efficacy of vaccination against HIV-1: vpr, vif, and CREB1. These play a crucial role in the development of therapeutic interventions, as described in references 2,69,70 . Our method does not identify other protein vaccine candidates, such as multiple gp120 envelope proteins 71 , because the statistical weight associated with protein removal relies on the number of connections and the effective long-range correlation between virus-host interactions.
Our findings indicate that a study of the viral/protein-human/protein interactions in H1N1 and HIV is able to identify important proteins used in vaccines. Considering that time is of the essence, this calls for the application of this method to other important viral systems, although further studies will be needed as experimental data become available.
Geometrical descriptors of structural changes of the network like the largest cluster average connectivity and the number of PPI capture the impact of highly connected nodes. This statistical description is sensitive to the number of disconnected links, which accounts for effective long-range interactions. Therefore, we are not able to account for targets which depend on long range biological interactions. However, by evaluating different scoring methods and using many protein interactions, we decrease the statistical bias and discrepancies between predicted proteins based on biological mechanisms and statistical methods.

Host-coronavirus protein interaction network.
Recently, a map of virus-host protein interactions for MERS-CoV, SARS-CoV-1, and SARS-CoV-2 has been published 44,45 , including a high-score update and additional statistical updates of mass spectrometry and PPI scoring information 44,45 , reporting more than 300 high-confidence interactions for these coronaviruses. To the best of our knowledge, a protein-interaction-based network analysis of this data set has not yet been carried out. Moreover, the networks' tolerance against antiviral attack has not been reported. Within this context, we study the tolerance of the MERS-CoV, SARS-CoV-1, and SARS-CoV-2 networks to the removal of a single protein [see Supplementary Information (SI), Figs. 3–5]. We construct a protein-interaction-based network for each of these coronaviruses by identifying virus-host interactions as nodes that interact with each other by sharing a virus or a host protein (see Fig. 3). According to the information provided by Krogan's lab group 44,45 , we use a high-confidence interactome to create host-coronavirus networks for each of these viruses (it is important to emphasize that our analysis focuses on the architecture of the interaction network rather than the protein network). Figure 3 shows that all networks display a hierarchical cluster structure; MERS-CoV and SARS-CoV-1 share a similar structure characterized by a largest cluster formed by two sub-clusters (see Fig. 3). In contrast, the SARS-CoV-2 largest cluster is formed by four sub-clusters, which suggests that the evolution from MERS-CoV to SARS-CoV-2 is closely linked to the largest cluster topology. Using the cellular localization analysis provided by Krogan's lab group 44,45 , we apply a color code [diffuse cytoplasm (red); endoplasmic reticulum (ER) (green); plasma membrane (PM) (blue); endosomes (cyan); Golgi (yellow); mitochondria (magenta); no information (black)] for the viral/protein localization, as shown in Fig. 3.
In the MERS-CoV and SARS-CoV-1 networks, the largest cluster appears mainly located in the diffuse cytoplasm. In SARS-CoV-2, by contrast, the largest cluster spreads across the diffuse cytoplasm, endoplasmic reticulum, and plasma membrane. Therefore, our study reveals that the hallmark of the SARS-CoV-2 infection is a highly connected virus/protein-host/protein interaction network across the entire host cell, which may be the key to its efficient infection mechanism. These results indicate that the protein-interaction-based network exhibits a hierarchical cluster structure which is highly correlated with cellular localization. Viral/proteins interact with host/proteins while in direct physical contact. However, we find that virus-host interactions which are apparently biologically disconnected may be linked indirectly through the virus/protein-host/protein interaction network. This may be a very important feature of the SARS-CoV-2 hijacking of host cell regulation. The virus/protein-host/protein interaction graph network is built on virus-host interactions mediating cellular functions and viral infection, which are connected if they share a common single virus/protein or host/protein. Figure 3 shows the MERS-CoV, SARS-CoV-1, and SARS-CoV-2 virus/protein-host/protein interaction networks, where virus/protein-host/protein interactions are represented as square marks and the links between them indicate that two interactions share a common single virus/protein or host/protein. Affinity purification mass spectrometry and the structural properties of protein-protein complexes allow the identification of these protein interactions. However, only a few of these interactions play a relevant biological role 72–74 . We compare several statistical methods to score the PPI and to create the virus/protein-host/protein interaction network (see Supplementary Information Figs. 7, 8).
To compare the infection mechanisms of MERS-CoV, SARS-CoV-1, and SARS-CoV-2, we construct their virus/protein-host/protein interaction networks. To quantify the effect of virus/proteins on the virus/protein-host/protein interaction, we investigate the variation of the average connectivity when some of these elements are removed. We use the change in the structural properties of the network as a proxy to simulate the potential damage produced by antiviral activity. Figure 4A shows that the highly pathogenic coronaviruses MERS-CoV, SARS-CoV-1, and SARS-CoV-2 display an almost linear relationship between the number of PPI and the average connectivity per cluster. Since the average connectivity of these small clusters increases with the number of PPI, they have a homogeneous connectivity distribution rather than highly connected virus-host interactions or hubs. However, the largest clusters contain key virus-host interactions that, despite their small number of connections, disconnect several virus-host sub-clusters when they are removed. For this reason, the largest cluster plays a more central role in the structure of the virus-host interaction network. The largest clusters have around 40-60 PPI for MERS-CoV and SARS-CoV-1, whereas the SARS-CoV-2 largest cluster has twice as many (see Fig. 4A).

Coronavirus-host protein network analysis.
Due to the urgency of the ongoing worldwide health emergency caused by Covid-19, we focus on the tolerance of SARS-CoV-2 against protein removal; similar analyses for MERS-CoV and SARS-CoV-1 are presented in the Supplementary Information (SI) Section, Figs. 3-5. Figure 3B shows that removal of most virus proteins does not affect the linear dependence of small clusters. However, removal of the nsp4, nsp12, nsp13, and nsp16 proteins substantially reduces the number of PPIs, bringing the largest cluster closer to the linear dependence. The disruptive effect of antiviral drugs mostly affects the host/protein-virus/protein interaction rather than a single viral protein. Therefore, a more useful approach is to investigate the tolerance of the virus-host network against the removal of a single virus-host interaction.
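The two kinds of removal discussed above (a whole protein versus a single virus-host interaction) can be compared with a small sketch. It is only illustrative: the function names and the toy labels `V1`-`V3` and `H1`-`H5` are hypothetical, the data are made up, and the average connectivity is assumed to be the mean number of links per interaction in the largest connected component.

```python
from itertools import combinations


def largest_cluster_stats(ppis):
    """Return (number of PPI, average connectivity) of the largest cluster."""
    n = len(ppis)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        # Link two interactions that share the virus or the host protein.
        if ppis[i][0] == ppis[j][0] or ppis[i][1] == ppis[j][1]:
            adj[i].add(j)
            adj[j].add(i)
    best, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    if not best:
        return 0, 0.0
    return len(best), sum(len(adj[i]) for i in best) / len(best)


def remove_protein(ppis, name, role):
    """Delete every interaction containing `name`; role is 'virus' or 'host'."""
    idx = 0 if role == "virus" else 1
    return [p for p in ppis if p[idx] != name]


def remove_interaction(ppis, interaction):
    """Delete one virus-host interaction (one node of the network)."""
    return [p for p in ppis if p != interaction]


# Made-up toy interactome: (virus protein, host protein) pairs.
TOY = [("V1", "H1"), ("V1", "H2"), ("V1", "H3"),
       ("V2", "H3"), ("V2", "H4"), ("V3", "H5")]

BASELINE = largest_cluster_stats(TOY)
WITHOUT_V1 = largest_cluster_stats(remove_protein(TOY, "V1", "virus"))
WITHOUT_V3 = largest_cluster_stats(remove_protein(TOY, "V3", "virus"))
WITHOUT_BRIDGE = largest_cluster_stats(remove_interaction(TOY, ("V1", "H3")))
```

In this toy case, removing the peripheral protein `V3` leaves the largest cluster untouched, while removing either the protein `V1` or the single bridging interaction `("V1", "H3")` breaks the largest cluster apart. This mirrors the idea that a few interactions can carry most of the structural effect.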

SARS-CoV-2 target protein interaction.
The SARS-CoV-2 network retains its structural properties against protein removal except when nsp4, nsp12, nsp13, or nsp16 is removed (see Fig. 4B and Supplementary Information (SI) section, Figs. 3–5). The comparison between MERS-CoV, SARS-CoV-1, and SARS-CoV-2 implies that common proteins are located in similar places and are thus expected to perform the same functions. The exception is nsp13, which changes its location from virus to virus and is thus expected to exhibit different functions; it is therefore not included in our analysis 44,45 . Moreover, we found that the set of virus-host interactions causing a significant change in the virus-host network appears in a small cluster linking the nsp4, nsp12, and nsp16 proteins, which belong to different complexes (Fig. 5). This provides a target which produces significant changes in the virus-host network and is thus the optimal candidate for an antiviral drug attack.
Comparative scoring methods analysis. We use four protein-protein interaction scoring methods, high confidence, MIST, Saint, and K, as published in 44,45 . All these scoring methods reveal the importance of the SARS-CoV-2 largest cluster for the virus-human interaction. The MIST and Saint scoring methods are based on experimental biological data and rank the confidence of the virus/protein-host/protein interactions between 0 and 1, where 1 indicates the maximum confidence value (we selected PPI with a confidence ≥ 0.6). Combining the MIST and Saint scoring methods with information on protein complexes, Krogan's lab group introduced a high confidence score 44,45 . Finally, the K score is defined as the average of the MIST and Saint scores 44,45 . In this case, we also use a confidence score ≥ 0.6. We identify critical targets by removing a PPI that produces a significant variation in the connectivity of the SARS-CoV-2 largest cluster. As shown in Table 1, all the scoring methods identify the nsp4/NUP210 and nsp16/NUP210 interactions as critical targets. The nsp12/USP54 interaction is detected only by the high confidence scoring method, since the nsp12 protein is not included in the MIST, Saint, and K data sets. This comparative analysis reveals that our method identifies crucial PPI in good agreement across different scoring methods; thus, our approach is a robust predictive method (see Table 1).
Table 1. Comparative analysis of the high confidence, MIST, Saint, and K scoring methods. We identify the PPI that produce a significant variation of the SARS-CoV-2 largest cluster connectivity. Empty boxes indicate non-identified PPI. The nsp4/NUP210 and nsp16/NUP210 interactions emerge as common key targets in all scoring methods. The nsp12/USP54 interaction appears as a key target only when we use the high confidence scoring, since this protein is not included in the MIST, Saint, and K data sets. This analysis reveals that our method allows us to detect key PPI independently of the statistical scoring method. Significant texts are in bold.
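The thresholding step in the comparative analysis above can be illustrated as follows. This is a sketch only: the record layout, the function names, and all score values are invented for the example, and only the K-score definition (the average of the MIST and Saint scores) and the 0.6 cutoff come from the text.

```python
THRESHOLD = 0.6  # confidence cutoff quoted in the text


def k_score(mist, saint):
    """K score: average of the MIST and Saint scores."""
    return (mist + saint) / 2


def select_ppis(records, method):
    """Return the set of (virus, host) pairs passing the cutoff for `method`."""
    selected = set()
    for rec in records:
        scores = {
            "MIST": rec["mist"],
            "Saint": rec["saint"],
            "K": k_score(rec["mist"], rec["saint"]),
        }
        if scores[method] >= THRESHOLD:
            selected.add((rec["virus"], rec["host"]))
    return selected


# Hypothetical records with made-up scores (not taken from any data set).
RECORDS = [
    {"virus": "vA", "host": "hA", "mist": 0.9, "saint": 0.8},
    {"virus": "vB", "host": "hB", "mist": 0.7, "saint": 0.4},
    {"virus": "vC", "host": "hC", "mist": 0.5, "saint": 0.9},
]

BY_METHOD = {m: select_ppis(RECORDS, m) for m in ("MIST", "Saint", "K")}
# Interactions retained by every scoring method.
CONSENSUS = set.intersection(*BY_METHOD.values())
```

Each scoring method yields its own set of retained PPI, from which a separate interaction network can be built; the intersection shows which interactions are available to all methods, which is a precondition for a target to be detected by all of them.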

Discussion
We developed a new, fast method for detecting the key controlling virus/protein-host/protein interactions, which identifies the SARS-CoV-2 infection mechanism. An analysis of the protein-interaction-based network provides a universal hierarchical cluster decomposition for the three coronavirus-host networks. This cluster decomposition identifies all virus-host interactions sharing a virus or host protein and localizes them in different cellular compartments. We find that the SARS-CoV-2 largest cluster structure extends across the entire host cell, revealing an enhanced hijacking of the host cell regulation mechanism. The evolution of the largest clusters coincides with new coronavirus strains, suggesting that further mutational changes will occur within this structure. We verified the validity of our method by applying it to the H1N1 and HIV viruses; in both cases we detect important target proteins which are used in antiviral drug development.
We simulate an inhibitory antiviral action on SARS-CoV-2 by removing a protein or a virus/protein-host/protein interaction. This process shows that removal of the nonstructural proteins nsp4, nsp12, nsp13, and nsp16 produces a significant change in the structural properties of the virus-host network, resembling the behavior of the previous coronaviruses MERS-CoV and SARS-CoV-1. Furthermore, we find that this effect is supported exclusively by a small set of virus-host interactions linking the nsp4, nsp12, and nsp16 proteins. Moreover, our protein-interaction-based network method does not depend on the scoring method for protein-protein interactions. Interestingly, our cluster decomposition coincides with the previously reported clustering of these three coronaviruses obtained from genetic analysis.
The SARS-CoV-2 infection spreads a viral interaction network across different intracellular compartments, which strongly suggests that this network's largest cluster encodes relevant mechanisms of viral infection. Our analysis highlights the significance of long-range protein interactions resulting from the emergent and collective behavior of the virus-host interactome, which may motivate the search for a new biological mechanism of viral infection. In a global health emergency, it is crucial to develop a fast method for characterizing viral mechanisms. Our analysis, based on the geometric properties and tolerance of a virus/protein-host/protein interaction network, is a valuable tool for understanding the viral mechanism of new variant strains of SARS-CoV-2 and for rapidly detecting target protein-protein interactions. Our insights may provide essential information for further antiviral drug development, uncover the role of nonstructural viral/proteins, and identify the importance of a small set of virus-host interactions. Furthermore, structural information on the virus/protein-host/protein interaction networks may help to understand the function of the spike protein in the mutations of SARS-CoV-2.
The conventional methods based on network analysis map the virus-host interaction into a graph between the virus and human proteins, where connections involve biological interactions. In contrast, we use an interaction-based network to explore its structure and the impact caused by removing certain proteins. It is our hypothesis that this simulates the antiviral action and brings a new perspective on viral infection. We validate this hypothesis by applying it to the known effects of protein removal on the H1N1 and HIV viruses. This methodology, applied to the SARS-CoV-2 infection, predicts that removal of the NUP210 and USP54 host proteins produces a major change in the connectivity of the SARS-CoV-2 interaction network. These proteins are known to play an important role in cancer therapy and autoimmune diseases 75,76 . Although nonstructural proteins (nsp4, nsp12, nsp16) have not attracted attention as target proteins, our validated method predicts them to be important candidates for antiviral drug and vaccine development. According to the available connectivity data, the SARS-CoV-2 spike proteins are weakly connected to the whole network. Because of this, removing them as proposed by our method does not affect the network structure and connectivity in a major way. On the other hand, in the case of H1N1, where more connectivity data are available, we properly predict which proteins are targets 77 . The spike proteins, the glycoproteins HA, and the neuraminidase proteins NA are located on the viral surface, and they are also the most variable viral proteins. Protein variability results in highly connected nodes because SARS-CoV-2 variants and mutations potentially change viral properties. The identification of the spike protein in this work is restricted by the available information.
Notwithstanding the preceding, our method has been able to identify crucial target proteins for the H1N1 and HIV viruses and has revealed the role of non-structural proteins, as has been reported by other authors 62–68 .
The predicted target proteins are validated by live attenuated and inactivated antigen vaccine methods. Similarly, standard antiviral therapy mainly uses inhibitors to prevent the binding of the spike and nucleocapsid proteins 78,79 . However, it has been reported that the non-structural proteins play a significant role in the virulence of the SARS-CoV-2 virus 80 , as our method predicts.
Molecular docking studies of ivermectin and remdesivir indicate spike, M, N, nsp14, and nsp16 as viable targets for the development of drug treatments for SARS-CoV-2 81,82 , while Raltegravir and Maraviroc are potential candidates to inhibit the nsp16 protein 83 . The multi-targeting features of Diosmin include the highest binding affinity and inhibitory action against several non-structural proteins (nsp3, nsp9, nsp12, nsp15) 84 . This confirmation of the role of non-structural proteins in treatment supports our target protein predictions.
Finally, our approach of graphically visualizing the connectivity of interactome networks overlaps with the design and optimization of neural networks, and as such our method could pave a roadmap to combine neuromorphic computing and artificial intelligence to optimize the design of drug treatments of viral diseases.

Materials and methods
Dataset. We obtain the protein sequence similarity, high-confidence protein interactions, and localization of the virus proteins from a public dataset, with permission, as reported by Krogan's lab group 44,45 .
Graph theory methods. A virus/protein-host/protein interaction network $G(I, k)$ is a set of virus-host interactions $I_1, I_2, \ldots, I_D$ with $k_1, k_2, \ldots, k_D$ connections between them, as described in Fig. 1. The graph representation of this network consists of $D$ nodes, where each node corresponds to one virus-host interaction. If two interactions share a common virus/protein or host/protein, then they are connected by a link. The number of connections of the $j$-th virus-host interaction $I_j$ is $k_j$. The graphs were constructed from binary matrices whose entries correspond to all virus/protein-host/protein interactions. Undirected links indicate connected virus-host protein interactions. The adjacency matrix $A_{ij}$, with $i, j = 1, 2, \ldots, D$, encodes all these interactions, with $A_{ij} = 1$ indicating that the $i$-th and $j$-th interactions are connected, and $A_{ij} = 0$ otherwise (Fig. 1). Notice that the number of connections of the $i$-th interaction is
$$k_i = \sum_{j=1}^{D} A_{ij}.$$
According to the cluster decomposition, the adjacency matrix of the fully connected interactions can be expressed as a sum of $N$ cluster adjacency matrices,
$$A_{ij} = \sum_{L=1}^{N} a^{(L)}_{ij},$$
where $a^{(L)}_{ij}$ stands for the adjacency matrix of the $L$-th cluster. To quantify the network tolerance against local failures due to the removal of a virus protein or a single virus-host interaction, we introduce the average connectivity as a proxy. This way, the cluster average connectivity is
$$\langle k \rangle_L = \frac{1}{n_L} \sum_{i,j} a^{(L)}_{ij}.$$
Here $n_L$ is the number of PPIs of the $L$-th cluster. Notice that the removal of a virus protein implies that all its interactions are deleted. In this case we have a network composed of $D-1$ interactions with $D-1$ connections.
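The matrix quantities described above can be checked numerically on a small example. The following sketch works directly on a hand-written adjacency matrix; the function names are hypothetical, clusters are assumed to be the connected components of the graph, and the cluster average connectivity is assumed to be the sum of the entries of the cluster adjacency matrix divided by the number of interactions in the cluster.

```python
def degrees(A):
    """k_i = sum over j of A[i][j]."""
    return [sum(row) for row in A]


def clusters(A):
    """Connected components of the graph encoded by A (assumed clusters)."""
    D, seen, out = len(A), set(), []
    for start in range(D):
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(D):
                if A[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        out.append(sorted(comp))
    return out


def cluster_matrix(A, members):
    """D x D matrix that keeps only the links inside one cluster."""
    inside, D = set(members), len(A)
    return [[A[i][j] if i in inside and j in inside else 0
             for j in range(D)] for i in range(D)]


def average_connectivity(A, members):
    """Sum of the cluster adjacency matrix divided by the cluster size."""
    a = cluster_matrix(A, members)
    return sum(map(sum, a)) / len(members)


# Toy network: interactions 0-1 and 1-2 are linked, interaction 3 is isolated.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]

K = degrees(A)
CLUSTERS = clusters(A)
PARTS = [cluster_matrix(A, c) for c in CLUSTERS]
# Element-wise sum of the cluster matrices, which should reproduce A.
TOTAL = [[sum(p[i][j] for p in PARTS) for j in range(len(A))]
         for i in range(len(A))]
AVERAGES = [average_connectivity(A, c) for c in CLUSTERS]
```

Because no link crosses between clusters, summing the cluster matrices element by element reproduces the full adjacency matrix, which is the decomposition stated in the text.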
Network's tolerance against the antiviral activity. We simulate the inhibitory action of antiviral drugs by means of node removal. The features of the cluster decomposition, in particular the properties of its largest cluster, change as virus proteins or virus-host interactions are deleted. We use the variation of the cluster average connectivity as a proxy to quantify the impact of virus proteins on the virus-host interaction networks, as detailed in the Supplementary Information (SI